Reading - Context-aware papers

Greg Detre

Wednesday, September 11, 2002

 

Sept 9

Creating technology/scenario matches

Out of Context (Lieberman, Selker);
Context Aware Design and Interaction (Selker, Burleson)

Sept 11

Experiment, role-play, mockup, prototype, use, experiment

Primer on Experimental and Quasi-Experimental Design, LAFCam

 

Reading - Dawson, Primer on Experimental and Quasi-Experimental Design

refers mainly to counseling psychology???

The Validity of Experimental Designs

Internal validity

internal validity = "the extent that extraneous variables (error variance) in an experiment are accounted for"

Campbell and Stanley (1963, pg 5): "internal validity is the basic minimum without which any experiment is uninterpretable"

"8 major threats to internal validity:

(a)    history, encompassing the environmental events occurring between the first and second observations in addition to the independent variable(s);

(b)   maturation, which refers to the processes within the participants (psychological and/or biological) taking place as a function of the passage of time, not attributable to the independent variable(s);

(c)    testing, which is sensitization to the posttest as a result of having completed the pretest;

(d)   instrumentation, which refers to deterioration or changes in the accuracy of instruments, devices or observers used to measure the dependent (outcome) variable;

(e)    statistical regression, which operates when groups are selected on the basis of their extreme scores, because these anomalous scores tend to regress toward the mean on repeated testing;

(f)     selection, which refers to the factors involved in placing certain participants in certain groups (e.g., treatment versus control), based on preferences;

(g)    mortality, which refers to the loss of participants and their data due to various reasons, e.g., death or sickness;

(h)   interactions of previous threats with selection. For example, a selection-maturation interaction results when the experimental groups are maturing at different rates based on the selection of the participants (Campbell & Stanley, 1963).

(i)    In later writings, Cook and Campbell (1979) identify an additional threat to internal validity. This is ambiguity about the direction of causal influence when all other plausible third-variable explanations have been ruled out of the A-B relationship, but it remains unclear as to whether A causes B, or B causes A."
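Threat (e), statistical regression, is easy to see in a toy simulation. The sketch below is only an illustration and is not taken from the primer: it assumes each participant has a stable true score plus independent measurement noise on each test (the function name and all numbers are made up), selects the top 10% on the first test, and retests them with no treatment in between.

```python
import random
import statistics


def simulate_regression_to_mean(n=10_000, top_fraction=0.1, seed=0):
    """Select the top scorers on a noisy test, then retest them with no treatment."""
    rng = random.Random(seed)
    # Assumed model: stable true score, plus fresh independent noise on each test.
    true_scores = [rng.gauss(100, 10) for _ in range(n)]
    test1 = [t + rng.gauss(0, 10) for t in true_scores]
    test2 = [t + rng.gauss(0, 10) for t in true_scores]

    # Keep only the participants with the most extreme (highest) first-test scores.
    cutoff = sorted(test1)[int(n * (1 - top_fraction))]
    selected = [i for i in range(n) if test1[i] >= cutoff]

    mean1 = statistics.mean(test1[i] for i in selected)
    mean2 = statistics.mean(test2[i] for i in selected)
    return mean1, mean2


first, second = simulate_regression_to_mean()
print(f"selected group, first test: {first:.1f}")
print(f"selected group, retest:     {second:.1f}")
```

The selected group's retest mean should fall back toward the population mean of 100 even though nothing was done to them, which is the kind of drift that could be mistaken for a treatment effect.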

External validity

"Which populations, settings, treatment variables and measurement variables can these results be generalized to?"

"Tests that do meet the representativeness criteria are, in essence, tests of statistical interaction."

"example statistical interaction threats to external validity:

interaction of selection and treatment

e.g. if there is an interaction between a therapeutic treatment and ethnicity, then it cannot be decisively stated that the treatment holds true across different ethnicities. When effects of differing magnitude exist, the researcher must delineate when and where the effect holds, and when and where it does not

interaction of setting and treatment

e.g., can a causal relationship obtained on a military installation also be obtained on a university campus?

The last interaction is between history and treatment. In this case, the question involves to which period of the past or future can the results obtained be generalized. For example, the majority of experiments take place on university campuses, with undergraduate university students as participants. If an experiment was conducted on the day after a football loss to this university's arch rival, then the results may not generalize even to a week after the loss, much less beyond the participants and setting represented in the original study."

two additional threats to external validity:

interaction of treatments with treatments (i.e. multiple parallel treatments)

interaction of testing with treatment - where the testing changes the effectiveness of the treatment

 

From now on, they use an "X" to represent the exposure of a group to an experimental treatment or event. An "O" signifies some type of observation or measurement. The Xs and Os in the same row refer to the same group, and the order of the characters from left to right gives the temporal order of the events. "R" denotes random assignment, where used.
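The X/O/R notation is regular enough to be read mechanically. As a rough sketch (the `Design` class and its method names are my own, not from the primer), each design can be stored as one string per group and queried for the three properties the notation exposes: random assignment, a control group, and a pretest.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Design:
    """A design written in X/O/R notation, one string per group (row)."""
    name: str
    rows: List[str]

    def groups(self) -> List[List[str]]:
        # Split each row into tokens such as "R", "O1", "X".
        return [row.split() for row in self.rows]

    def is_randomized(self) -> bool:
        # Every group must start with "R" (random assignment).
        return all(tokens[:1] == ["R"] for tokens in self.groups())

    def has_control_group(self) -> bool:
        # A control (comparison) group is a row that is never exposed to X.
        return any("X" not in tokens for tokens in self.groups())

    def has_pretest(self) -> bool:
        # A pretest is an observation that comes before X in a treated row.
        for tokens in self.groups():
            if "X" in tokens:
                before_x = tokens[: tokens.index("X")]
                if any(t.startswith("O") for t in before_x):
                    return True
        return False


examples = [
    Design("one-shot case study", ["X O"]),
    Design("static group comparison", ["X O", "O"]),
    Design("pretest-posttest control group", ["R O1 X O2", "R O3 O4"]),
    Design("posttest-only control group", ["R X O1", "R O2"]),
]

for d in examples:
    print(d.name, d.is_randomized(), d.has_control_group(), d.has_pretest())
```

Tokens are matched by their first letter, so both "O" and numbered forms like "O1" count as observations.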

Three pre-experimental designs

"Preexperimental designs are those in which there is no control group and/or have comparison groups that are formed nonrandomly, therefore yielding results which are difficult to interpret", e.g.:

The one-shot case study

a single group was studied only once after a treatment was applied

X O

Campbell & Stanley, 1963:

"...such a total absence of control as to be of almost no scientific value"

"securing scientific evidence involves making at least one comparison"

The one-group pretest-posttest design

O1 X O2

"In this design, history is one of the uncontrolled rival hypotheses, as the changes between O1 and O2 may have been due to events that possibly occurred in addition to the experimenter's X"

testing and maturation are amongst the many other such threats to internal validity

The static group comparison

"a posttest is administered to two groups, one having been administered the X, and the other not (a control group)"

X O

O

problem: the status of the two groups prior to the administration of X is unknown, since the participants are not randomly assigned to the two groups

so there are problems with selection and selective drop-out (mortality), for instance

Three true experimental designs

random assignment is utilized, therefore reducing the number of potential threats to internal validity

Pretest-posttest control group design

R O1 X O2

R O3 O4

differences attributable to history, instrumentation, maturation and testing between pretest and posttest should be the same in both groups, and so accounted for

most of the other threats to internal validity should be guarded against by the random assignment of participants, since they probably occur equally across the two groups

"Ironically, the major weakness of this design is in fact, its major strength, but for external validity reasons. The pretest would sensitize both the control group and the experimental group to the posttest in a like manner, therefore presenting no internal threat to validity. However, generalizing the results of a treatment that included a pretest in the design, to a different sample without a pretest, may yield much different results (Heppner et al., 1992)."
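One common way to summarise this design (an assumption on my part, these notes do not record how the primer analyses it) is to subtract the control group's gain from the treated group's gain, so that anything affecting both groups equally (history, maturation, testing) cancels out. The scores below are hypothetical.

```python
from statistics import mean


def gain_score_difference(o1, o2, o3, o4):
    """(O2 - O1) - (O4 - O3): treated-group gain minus control-group gain."""
    return (mean(o2) - mean(o1)) - (mean(o4) - mean(o3))


# Hypothetical scores, four participants per group.
treated_pre = [10, 12, 11, 13]    # O1
treated_post = [15, 17, 16, 18]   # O2
control_pre = [11, 12, 10, 13]    # O3
control_post = [13, 14, 12, 15]   # O4

effect = gain_score_difference(treated_pre, treated_post, control_pre, control_post)
print(f"treated gain minus control gain: {effect}")
```

Here both groups improve, but the treated group improves by more, and only that surplus is credited to X.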

The posttest-only control group design

R X O1

R O2

randomisation, but without pretest

"internal validity of this design is basically solid"

"the posttest-only control group design is the prototypical experimental design, and most closely exemplifies a condition in which a causal relationship can be discerned between an independent and dependent variable" (Cook & Campbell)

"Another problem deals with the absence of a pretest employed to reduce variability in the dependent variable"

this should be accounted for by randomisation, but:

"(a) many researchers have a very loose definition of what randomization is, and (b) true randomization carries with it very stringent criteria and many researchers are unaware of the necessary precision and falsely believe they have true randomization, when they do not"
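The notes do not record what the stringent criteria are, so the following is only a minimal sketch of one simple procedure: shuffle the whole participant list with a pseudo-random generator and deal it out, so that group membership does not depend on preference, arrival order or experimenter judgment. The function name, participant labels and seed are all hypothetical.

```python
import random


def randomly_assign(participants, n_groups=2, seed=None):
    """Shuffle the participants, then deal them round-robin into n_groups."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    # Slicing with a stride keeps group sizes within one of each other.
    return [shuffled[i::n_groups] for i in range(n_groups)]


participants = [f"P{i}" for i in range(1, 9)]
# The seed is fixed only so the example is reproducible.
treatment, control = randomly_assign(participants, seed=42)
print("treatment:", treatment)
print("control:  ", control)
```

Assigning by sign-up order, by alternating volunteers, or by letting participants choose would not count as random in this sense, which is presumably the loose usage the quote complains about.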

The Solomon four-group design

R O1 X O2

R O3 O4

R X O5

R O6

when a pretest is desired, but there is concern over the effects of using a pretest

combination of the pretest-posttest control group design (the first two groups), and the posttest-only control group (the last two groups)

future replicability:

"accounts for the problem that the pretest-posttest control group design has, by comparing O2 to O5 to account for pretest sensitization, the only difference being that O2 receives a pretest prior to treatment

with regard to generalizability, the researcher can compare O2 to O4 and O5 to O6. If treatment effects are found in both cases, the results will be considered strong, and suggest future replicability, as one replication is confirmed with the data in hand"
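The three comparisons named in the quote can be written out directly. This is a sketch using differences of group means on hypothetical posttest scores. A real analysis would also test whether the differences are statistically reliable, which is not attempted here.

```python
from statistics import mean


def solomon_comparisons(o2, o4, o5, o6):
    """Posttest comparisons for the Solomon four-group design."""
    return {
        # Treatment effect among pretested groups (O2 vs O4).
        "effect_with_pretest": mean(o2) - mean(o4),
        # Treatment effect among unpretested groups (O5 vs O6).
        "effect_without_pretest": mean(o5) - mean(o6),
        # Pretest sensitization: both groups were treated, only O2 was pretested.
        "pretest_sensitization": mean(o2) - mean(o5),
    }


# Hypothetical posttest scores.
result = solomon_comparisons(
    o2=[16, 18, 17, 19],
    o4=[13, 14, 12, 15],
    o5=[15, 17, 16, 18],
    o6=[12, 14, 13, 15],
)
for name, value in result.items():
    print(f"{name}: {value}")
```

In this made-up data the treatment effect shows up both with and without a pretest, and the pretested group scores slightly higher after treatment, which would hint at some pretest sensitization.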

requires time, energy and resources

Three Quasi-Experimental Designs

For when a true experimental design is not available to a researcher for various reasons, e.g., in clinical settings where intact groups are already formed, when treatment can not be withheld from a group, or when no appropriate control or comparison groups are available

The major difference between true and quasi-experimental designs is the random assignment of participants

Nonequivalent-groups designs

most frequently used quasi-experimental design (because it's often interpretable)

similar to the pretest-posttest control group experimental design considered earlier, but with non-random groups

Non-R O1 X O2

Non-R O3 O4

accounts for many of the threats to internal validity, except for:

(a)    selection-maturation - "As stated earlier, many researchers falsely believe that the administration of a pretest remedies the nonrandom assignment of participants, and use ANCOVA to 'level' the groups. As has been succinctly pointed out by Loftin and Madison (1991), applying an ANCOVA does not always make groups equal. Furthermore, using a no-difference null hypothesis based on pretest scores is faulty logic as any fail-to-reject decision when testing any H0 does not justify believing that the null hypothesis is true"

(b)    instrumentation

(c)    differential statistical regression

(d)    interaction of selection and history
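Point (a) can be made concrete with a deterministic toy example (my own illustration, with made-up numbers, not something from the primer). Two intact groups have identical pretest scores but mature at different rates, and there is no treatment effect at all. Matching on the pretest finds no difference, yet the gain scores still differ.

```python
def posttest(pretest, growth_rate, periods, treatment_effect=0.0):
    """Score after `periods` of maturation plus any treatment effect."""
    return pretest + growth_rate * periods + treatment_effect


# Hypothetical intact groups: same pretest score, different maturation rates.
treated_pre = 50.0
control_pre = 50.0
treated_post = posttest(treated_pre, growth_rate=2.0, periods=3)  # no treatment effect
control_post = posttest(control_pre, growth_rate=1.0, periods=3)

pretest_gap = treated_pre - control_pre
apparent_effect = (treated_post - treated_pre) - (control_post - control_pre)
print(f"pretest gap: {pretest_gap}")
print(f"apparent treatment effect: {apparent_effect}")
```

The apparent effect comes entirely from the difference in growth rates, so equal pretest scores do not show that the groups are equivalent in the way random assignment would make them.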

Cohort designs

cohort designs are typically stronger than nonequivalent-groups designs because cohorts are more likely to be closer to equal at the outset of the experiment, e.g. TAMU freshmen in 1995 versus TAMU freshmen in 1996

O1

X O2

the most obvious problems with this design are with the passage of time between the two cohorts, and the nonrandom assignment of participants to the cohort

Time-series designs

characterised by multiple observations over time

in the interrupted time-series design (the most basic of this class) a treatment is introduced at some point in the series of observations

O1 O2 O3 O4 X O5 O6 O7 O8

"observe over time any difference after the treatment is implemented, to discern if there is a continuous effect versus a discontinuous effect

a continuous effect would be treatment effects that remain stable after the initial discontinuity produced by the intervention.

a discontinuous effect would be a result that decays over time.

this design also accounts for effects that are instantaneous, versus delayed in their manifestations (Cook & Campbell, 1979), i.e., with repeated observations after the treatment is implemented, the researcher can ascertain how quickly the effects are initiated."

major threat to internal validity in this design is history, i.e., that variables other than the treatment under investigation came into play immediately after introduction of the treatment

also problems with instrumentation and selection
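A rough sketch of how the interrupted time-series observations might be summarised, using the notes' own terms (continuous = the change stays stable, discontinuous = the change decays). It simply compares each post-treatment observation with the pre-treatment mean. The function name and both series are hypothetical, and a real analysis would model trend and autocorrelation rather than use a flat baseline.

```python
from statistics import mean


def interrupted_series_summary(observations, treatment_index):
    """Compare post-treatment observations with the pre-treatment baseline."""
    if not 0 < treatment_index < len(observations):
        raise ValueError("need at least one observation on each side of X")
    baseline = mean(observations[:treatment_index])
    # Change from baseline at each observation after X (O5, O6, ...).
    changes = [o - baseline for o in observations[treatment_index:]]
    return {
        "baseline": baseline,
        "immediate_change": changes[0],
        "final_change": changes[-1],
        "changes": changes,
    }


# O1..O4, then X, then O5..O8 (hypothetical values).
stable = interrupted_series_summary([10, 10, 11, 9, 15, 15, 14, 16], 4)
decaying = interrupted_series_summary([10, 10, 11, 9, 15, 13, 11, 10], 4)
print("stable:  ", stable["changes"])
print("decaying:", decaying["changes"])
```

In the first series the jump persists (a continuous effect in the notes' sense). In the second it fades back to baseline (a discontinuous effect). A small `immediate_change` followed by larger later changes would instead suggest a delayed effect.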

Reading - Lockerd & Mueller, LAFCam

uses affective data to improve video recording

this might be particularly useful for home videos, where the cameraman is likely to be the editor, so heightened emotion might indicate important sequences to be preserved/edited later

based on the sound of the cameraman laughing and galvanic skin response - also records the cameraman's face

 

Discarded

And even if the LAFCam II included a complete fMRI machine, it would still suffer from major limitations.

how marketable do I think this is???

also, laughter tends to involve a build-up which culminates in a sort of climactic moment - the software would presumably include the previous 5 seconds or so as well

could you not use spoken input from the cameraman as a sign that he's engaging with the scene??? however, although sometimes it's what he's saying that is most important, sometimes he's just chatting or muttering, and sometimes he's maybe lining people up for the shot (so you want to cut till he stops talking) - requiring the system to figure out which...
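A sketch of what that pre-roll might look like, purely as a guess at the idea and not a description of the actual LAFCam software: given detected laughter intervals in seconds, extend each one backwards by a few seconds and merge any clips that then overlap. The function name `clips_to_keep` and its parameters are hypothetical.

```python
def clips_to_keep(laugh_intervals, pre_roll=5.0, post_roll=0.0):
    """Pad each (start, end) laughter interval and merge overlapping clips."""
    padded = sorted(
        (max(0.0, start - pre_roll), end + post_roll)
        for start, end in laugh_intervals
    )
    merged = []
    for start, end in padded:
        if merged and start <= merged[-1][1]:
            # Overlaps the previous clip, so extend it instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical detections: two bursts of laughter close together, one later.
detections = [(12.0, 14.0), (16.0, 18.0), (40.0, 41.0)]
clips = clips_to_keep(detections)
print(clips)
```

With a 5 second pre-roll the first two bursts end up in a single clip that includes the build-up, while the later one stays separate.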

 

Questions

Primer on Experimental and Quasi-Experimental Design

counseling psychology

are the 8 major threats to internal validity mainly specific to counseling psychology??? can I think of some that aren't???

static group comparison - why not assign the participants to the control and testing group randomly??? is this all that precludes it from being an experimental design???

what are the controversies surrounding the use of pretests after random assignment (Heppner, Kivlighan, & Wampold, 1992)???

"The main weakness of [the posttest-only control group] design concerns external validity, i.e., the interaction of selection and treatment" - but it utilises randomisation???

don't understand the discussion of the selection-maturation effect in the non-equivalent groups design, esp re ANCOVA??? why doesn't the administration of a pretest remedy the nonrandom assignment of participants???

LAFCam

"Hidden Markov Models (HMMs), one for laughter and one for all other speech using Expectation Maximization of spectral coefficients of the audio signal"

is this system really doing very much more than just some sort of button that allows you to tag a really good sequence manually??? that requires the cameraman to remember, but would work pretty well - does this exist???

you could perhaps use face recognition (like Cog's) to track people close-up as they move through a fixed shot

has any work been done on which emotion arousal indicators correspond to the shots people want to keep??? this appears to be aimed at family home videos, where laughter is probably the best signal of emotional involvement, but would it pick up a moment of appreciation of aesthetic beauty (which entirely lacks humour)???

am guessing that the emotion sensors used are particularly cheap and cheerful, as well as non-invasive - what further options exist???

Or perhaps directors could be taught to subconsciously raise or lower some arbitrary physiological response in proportion to their satisfaction with a given shot, i.e. could they learn to heighten their emotional response when they get a good shot?